Utility-based archiving

ABSTRACT

A system and associated methodology is provided that is adapted to infer what to do with an item, and more particularly whether to archive and/or keep active an item in a more active, easy-to-access store based upon a cost-benefit analysis. The cost-benefit analysis determines the overhead associated with keeping the item active (e.g., not archiving it) versus the gains in connection with having quick and easy access to the item. The cost of maintaining an item in an active state is measured in terms of the size of the item which, in turn, affects the amount of space needed to store it. The benefit of keeping the item active is measured in terms of a probabilistic determination describing how a user will access the item in the future, which is a reflection of the utility of the item in an accessible state. The invention leverages notions of temporal sensitivity of the likelihood that an item will be needed in the future such that determined values and inferences can be dynamically updated over time. Items having a small probability of being accessed again after an initial review are categorized as one-shot items.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation application of co-pending U.S. patent application Ser. No. 09/894,392, filed on Jun. 28, 2001, entitled “Utility-Based Archiving”, the entirety of which is incorporated herein by reference.

TECHNICAL FIELD

[0002] The present invention relates generally to computer systems, and more particularly to a system and associated methodology adapted to infer how to store an item based upon a cost-benefit analysis related to maintaining the item in an active, efficiently accessible state.

BACKGROUND OF THE INVENTION

[0003] Computer systems and related technologies have become a staple in many aspects of modern society. People have come to rely on these systems as a tool for use in both personal and professional lives. These technologies have, among other things, provided for increased communication and sharing of information among individuals and entities. For instance, computer systems and related technologies are currently used in conjunction with the Internet and local area networks to enable people to access, receive, generate and share unprecedented amounts of data (e.g., documents, spreadsheets, presentations, Internet files and email).

[0004] While individuals and society as a whole benefit from such free flow of information, there are costs associated with managing such data. Moreover, these costs generally grow as the volume of shared information increases. Accordingly, since more and more information continues to be created and circulated among a greater number of users, an important use of computer systems and related technologies is that of data management.

[0005] With respect to email messages, for example, an individual may be inundated with a large number of new email messages. As such, a user may be required to spend significant time and energy reviewing, responding to, organizing and/or sorting through these messages. Moreover, the number of email messages received is often inversely proportional to the amount of time available. For example, a manager who oversees many employees may have very little time to sort through, organize and/or respond to a significant number of email messages. However, such an individual tends to fall within a group that receives a disproportionately large number of email messages. Some of the manager's email messages may contain information that is of interest (e.g., status reports for ongoing projects). As such, the manager may wish to review these messages one or more times. Other messages may, instead, contain irrelevant information (e.g., unsolicited junk mail). In this case, the manager may wish to spend as little time as possible dealing with these messages and, therefore, would prefer that these messages be discarded immediately. Unfortunately, conventional systems generally do not provide for automatically discarding irrelevant messages and/or prioritizing messages based upon relevance; the messages are not sorted based upon their respective values and sizes. To date, systems and methodologies merely archive messages based upon chronology. Moreover, these systems usually require some type of consent before items can be discarded. As an example, a user is often asked whether it is appropriate to update a list of messages (e.g., whether to archive one or more less relevant messages).

[0006] Since information will increasingly be propagated among individuals, managing of such information becomes more of an issue. While it may be desirable to indefinitely maintain all relevant items in an active state so that they can be quickly and easily accessed, technological realities limit the number of items that can be maintained in such a state. More particularly, computer systems have memory limitations. Fast memory that allows computer systems to maintain items in an active state is more limited than slower archival memory. Fast memory is, therefore, generally more expensive than archival memory, making it more costly to store an item in an active state. Moreover, sorting through vast amounts of items in an active state becomes very laborious (e.g., some individuals have thousands of e-mails in an active state), thus creating a need for streamlining of active items.

SUMMARY OF THE INVENTION

[0007] The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the invention. It is intended to neither identify key or critical elements of the invention nor delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.

[0008] The present invention provides a system and methodology operable to infer or approximate what to do with an item. In one example, a value density is used for item ordering, which yields an approximation for choosing the best assortment of items of different sizes and values for storing in a limited space (e.g., limited memory). The value density is obtained via a cost-benefit analysis to infer whether a user would prefer to archive, store as active or discard an item (e.g., e-mail, text, document, web page, image, audio). The invention provides for streamlining the number of items stored as active in order to facilitate ease of review, access and searching of items. In the cost-benefit analysis, overhead attributable to keeping the item active is compared to gains associated with having quick and easy access to the item. A value density is thereby obtained as a measurement of the worth of the item given its size and utility. The value density provides a basis for comparing one item to another such that a decision can be made as to which items to maintain in an active state. Given a finite amount of active space within which to maintain items, items having greater value densities can be actively retained. In this manner, inefficient utilization of active space can be mitigated.

[0009] The system and associated methodology of the present invention are adaptable to computer related applications including, but not limited to, email and document retention systems. Since there is a limited amount of active memory on computer systems, the cost of maintaining an item in an active state is measured in terms of the item's size. The benefit of keeping the item in an active state is measured in terms of a probabilistic determination that a user will access the item. The output of a probabilistic determination is a reflection of the relevance and utility of an item. In accordance with an aspect of the present invention, notions of temporal sensitivity of the likelihood that an item will be needed in the future can be leveraged, such that probabilistic determinations can continually be updated over time. As such, measurements of the utility of items and inferences drawn therefrom may be ongoing.

[0010] According to another aspect of the present invention, an item can be classified as either an item that will be accessed more than once or an item that will be accessed just a single time. An item that will be accessed only a single time is referred to as a “one-shot” item. A one-shot item can be archived after it is viewed since it is not likely to be accessed again. For example, if the item happens to be a short email message (e.g., “see you later”) sent in response to user initiated dialog by an entity with which the user frequently corresponds, it may be regarded as a one-shot message and be discarded or archived after it is reviewed. An item can be branded a one-shot item based upon its determined probability and/or value density. For instance, if a determined probability/value density is less than a threshold probability/value density, an item may be regarded as a one-shot item (e.g., the probability that the item will be read more than once is less than 0.5). Alternatively, an item may be branded as a one-shot item if its determined probability/value density changes by more than a certain amount within a given period of time.

[0011] According to still another aspect of the present invention, a learning system can act upon an inference system to adjust inferences made thereby. For instance, the learning system can modify the manner within which the inference system decides that an item is to be regarded as a one-shot item.

[0012] An interactive user interface (“UI”) is provided that allows a user to personalize how items are stored, including how probabilistic and inferential determinations are made.

[0013] To the accomplishment of the foregoing and related ends, certain illustrative aspects of the invention are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles of the invention may be employed and the present invention is intended to include all such aspects and their equivalents. Other advantages and novel features of the invention will become apparent from the following detailed description of the invention when considered in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1 is a schematic diagram illustrating a system that provides for inferring whether to actively store an item in accordance with an aspect of the present invention;

[0015] FIG. 2 is a schematic diagram illustrating tables of information that may be stored in a property log in accordance with an aspect of the present invention;

[0016] FIG. 3 is an illustration of assortments of items stored in active and archive item stores at different times, namely t1, t2 and t3, in accordance with an aspect of the present invention;

[0017] FIG. 4 is a schematic diagram illustrating a system in a networked environment that provides for inferring whether to actively store items for multiple users in accordance with an aspect of the present invention;

[0018] FIG. 5 is a schematic diagram illustrating a system that provides for inferring whether to actively store items and to optimize usage of active space in accordance with an aspect of the present invention;

[0019] FIG. 6 is a schematic diagram illustrating a system that includes a learning component operable to adjust inferences regarding whether to actively store items in accordance with an aspect of the present invention;

[0020] FIG. 7 is an illustration of an interactive user interface (UI) in accordance with an aspect of the present invention;

[0021] FIG. 8 is a curve illustrating the probability that a user will access an item over time;

[0022] FIG. 9 is an alternative curve illustrating the probability that a user will access an item over time;

[0023] FIG. 10 is a flow diagram illustrating a methodology to infer whether to actively store an item in accordance with an aspect of the present invention; and

[0024] FIG. 11 is a schematic block diagram illustrating a suitable computing environment in accordance with an aspect of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0025] The present invention is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, to one skilled in the art that the present invention may be practiced without these specific details. Moreover, well-known structures and devices are illustrated in some instances in block diagram form in order to facilitate description of the present invention.

[0026] The present invention relates to a system and associated methodology adapted to infer what to do with an item, and more particularly whether to archive and/or keep active an item based upon a cost-benefit analysis. The cost-benefit analysis determines the overhead associated with keeping the item active (e.g., not archiving it) versus the gains in connection with having quick and easy access to the item. The cost of maintaining an item in an active state is measured in terms of the size of the item which, in turn, affects the amount of space needed to store it. The benefit of keeping the item active is measured in terms of a probabilistic determination that a user will access the item, which is a reflection of the relevance and utility of the item. The invention contemplates temporal sensitivity such that determined values and inferences can be dynamically updated over time.

[0027] Referring initially to FIG. 1, a system 100 is illustrated that provides for inferring whether to keep active, delete or archive items 102 numbered 1 through N (N being an integer). An inference system 110 receives the items 102 and makes an inference as to how a user would like to store the respective items 102. The inference system 110 may also receive and utilize extrinsic data 104 to decide how to store the items. An active item store 120 is employed to store items in an active state, while an archive item store 130 can be utilized to retain certain items in an archived state. Items in the active store are more readily available, while items relegated to the archive store are more difficult to access. For instance, an email message maintained in the active store may be presented to a user on a display (e.g., monitor), such that the user can easily access it (e.g., double clicking with a mouse). In the archive store, however, a user may be required to dig down through multiple archive files before being able to view a message, or only a link to an item stored offline may appear. Although common reference is made herein to email messages, in accordance with the invention, items can be of any type susceptible to inference. For example, the items can be e-mail, documents, web pages, news articles, images or sound recordings. Examples of images include medical images such as microscope images, MRI images, X-rays, fingerprints, works of art, and videos, such as might be taken by a robot going about a task. Examples of sound recordings include music recordings and voice recordings.

[0028] The inference system 110 is operatively coupled to a property log 150 adapted to store information relating to items, users and extrinsic data. For email messages, for example, item information contained in the log 150 may include, among other things, data revealed within email headers, such as the title/subject matter of the message, context (e.g., when/where/how/circumstances under which it was created), message priority (e.g., whether it is urgent or of normal importance), the size of the message, whether there are any attachments and who sent the message (e.g., an employer or an unknown entity that may send out unsolicited junk mail). User related information maintained in the log may include name, age, title, job description, current medium/environment being utilized (e.g., office PC, car phone, personal digital assistant “PDA”), the time, date and frequency of access to one or more messages, and user preferences gathered from implicit evidence obtained by monitoring user activity. Extrinsic data in the log may include, but is not limited to, current date/time, news, headlines, holiday calendar dates, sporting events/announcements.

[0029] As shown in FIG. 2, some of this information may be stored in a tabular or grid-like format in the property log. More particularly, properties 210 numbered 1 through M can be maintained for items 202 numbered 1 through O (M and O being integers). Similarly, properties 220 numbered 1 through P can be stored for users 230 numbered 1 through Q (P and Q being integers). As time goes on, entries within the log can be refreshed. For example, the number of times that a particular user accesses a certain item can be updated over time. Similarly, the medium that a user is engaged with can be refreshed over time (e.g., where the user switches from his/her office PC to a car phone).
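By way of illustration only, the tabular property log described above could be sketched as two keyed tables, one for items and one for users, with entries refreshed as activity is observed. The class name PropertyLog, its method names and the property names (size_bytes, access_count, medium) are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PropertyLog:
    """Hypothetical in-memory property log: one property table per item and per user."""
    item_properties: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    user_properties: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def set_item_property(self, item_id: str, name: str, value: Any) -> None:
        self.item_properties.setdefault(item_id, {})[name] = value

    def set_user_property(self, user_id: str, name: str, value: Any) -> None:
        self.user_properties.setdefault(user_id, {})[name] = value

    def record_access(self, item_id: str) -> int:
        """Refresh the access count for an item and return the new count."""
        props = self.item_properties.setdefault(item_id, {})
        props["access_count"] = props.get("access_count", 0) + 1
        return props["access_count"]


# Example values are made up for illustration.
log = PropertyLog()
log.set_item_property("msg-1", "size_bytes", 2048)
log.set_item_property("msg-1", "has_attachment", False)
log.set_user_property("user-1", "medium", "office PC")
log.record_access("msg-1")
log.record_access("msg-1")
log.set_user_property("user-1", "medium", "car phone")  # user switched devices
```

Keying each table by identifier keeps a refresh (a new access, a change of medium) to a single update of one entry.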

[0030] With reference back to FIG. 1, the inference system 110 is operable to utilize the information relating to items 102, users and/or extrinsic data 104 to infer how to store items. The inference system does this, in part, by determining probabilities associated with items. More particularly, a probability component 112 of the inference system 110 calculates the likelihood that a user will access the items by applying the information through one or more probabilistic techniques. This can be expressed and read as:

[0031] p(access | E); the probability that a user will access the item given some evidence.

[0032] The probabilistic techniques employed may include, but are not limited to, neural networks, naive Bayesian processing, sophisticated Bayesian processing, similarity analysis employing dot product and/or cosine function processing and decision tree processing. By determining the likelihood that a user will access an item, the probability component 112 derives a metric indicative of the item's utility.
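As one hedged illustration of the naive Bayesian processing named above, the following sketch estimates p(access | E) from binary evidence features with Laplace smoothing. The function names, the feature names (from_employer, has_attachment) and the training history are assumptions made for the example; the specification does not prescribe a particular model or feature set.

```python
from collections import defaultdict


def train_naive_bayes(examples):
    """examples: list of (features, accessed), where features maps name -> bool."""
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for features, accessed in examples:
        class_counts[accessed] += 1
        for name, present in features.items():
            if present:
                feature_counts[accessed][name] += 1
    return class_counts, feature_counts


def p_access(model, features):
    """Posterior probability of access given the evidence, under the naive assumption."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Laplace-smoothed prior and per-feature likelihoods.
        score = (class_counts[label] + 1) / (total + 2)
        for name, present in features.items():
            p_feature = (feature_counts[label][name] + 1) / (class_counts[label] + 2)
            score *= p_feature if present else (1.0 - p_feature)
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


# Made-up history: (evidence, whether the item was accessed again).
history = [
    ({"from_employer": True, "has_attachment": True}, True),
    ({"from_employer": True, "has_attachment": False}, True),
    ({"from_employer": False, "has_attachment": False}, False),
    ({"from_employer": False, "has_attachment": False}, False),
]
model = train_naive_bayes(history)
p_high = p_access(model, {"from_employer": True, "has_attachment": True})
p_low = p_access(model, {"from_employer": False, "has_attachment": False})
```

With this history, a message from the employer with an attachment scores well above 0.5, while an unattached message from another sender scores well below it.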

[0033] With continuing reference to FIG. 1, the probability component 112 is operatively coupled to a cost-benefit component 114 of the inference system. The cost-benefit component 114 is operative to calculate a value density based upon a determined probability and size of an item. As such, the value density is a measurement of the relative worth of an item given its size. More particularly, the value density measures the overhead attributed to keeping the item active versus the gains associated with having quick and easy access to the item. The value density can be expressed and read as:

$\text{value density} = \frac{p(\text{access} \mid E)}{\text{item size}}$;

[0034] the ratio of probability of user access and item size.
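The ratio above can be computed directly. The following is a minimal sketch; the function name value_density, the choice of kilobytes as the size unit and the example figures are assumptions for illustration.

```python
def value_density(p_access: float, item_size: float) -> float:
    """Ratio of access probability to item size (the size unit is an assumption)."""
    if not 0.0 <= p_access <= 1.0:
        raise ValueError("p_access must be a probability in [0, 1]")
    if item_size <= 0:
        raise ValueError("item_size must be positive")
    return p_access / item_size


small_note = value_density(0.25, 2.0)      # hypothetical 2 KB message
large_report = value_density(0.75, 500.0)  # hypothetical 500 KB attachment
```

Under these made-up figures the small note has the higher value density even though the large report is more likely to be accessed, which is the trade-off the metric is meant to capture.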

[0035] The inference system 110 can then utilize, among other things, determined probabilities and value densities to decide what to do with items or approximate orderings. By way of example, one or more rules can be utilized in the inference system. One such rule may indicate that an item should be moved into the archive item store 130 if its determined value density is below some threshold value. It is to be appreciated, however, that archiving rules can be applied to factors other than determined value densities. The use of value densities for ordering is but one approximation for choosing the best arrangement of items to store in a limited space. The inference system is operative to utilize any suitable number of methodologies and/or models to infer what to do with items. Additional criteria may also be utilized to control other aspects of items, such as the manner within which an item is presented. For instance, if user related properties reveal that the user is utilizing a PDA to access email messages, then the messages may be presented in a PDA ready format (e.g., appropriately reduced resolution).

[0036] By way of further illustration, the inference system can be configured to make decisions as to whether an item is likely to be accessed only a single time and, if so, to brand that item as a “one-shot” item. To mitigate inefficient use of space, a one-shot item can be archived or discarded once it is accessed. In accordance with the present invention, the inference system 110 can examine items 102, extrinsic data 104 and/or information stored in the property log 150 to surmise whether an item should be regarded as a one-shot item. For example, the system may infer that a short reply message, such as “see you later”, that does not include any attachments, is sent by someone with whom the user frequently corresponds, and is sent in response to user initiated dialog should be considered a one-shot message. Alternatively, the system may deduce that an email message which is sent from a user's employer and which includes one or more attachments should not be regarded as a one-shot message.

[0037] Determined probabilities and/or value densities can be evaluated to gauge the status of an item. For instance, the results of a probabilistic determination can be compared to a threshold probability. If the determined probability is less than the threshold probability (meaning that it is rather unlikely that a user will access the item), then an inference can be made that the item is a one-shot item. Similarly, a determined value density can be compared to a threshold value density. If the determined value density is below the threshold value density, an inference can be made that the item is a one-shot item.
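The threshold comparisons just described might be sketched as follows. The function name is_one_shot and the default threshold values are illustrative assumptions; in practice the thresholds would be whatever the system or user configures.

```python
def is_one_shot(p_access_again=None, density=None,
                p_threshold=0.5, density_threshold=0.1):
    """Flag an item as one-shot if either supplied metric is below its threshold.

    The default thresholds are placeholders, not values fixed by the specification.
    """
    if p_access_again is None and density is None:
        raise ValueError("supply a probability, a value density, or both")
    below_p = p_access_again is not None and p_access_again < p_threshold
    below_density = density is not None and density < density_threshold
    return below_p or below_density


reply_flag = is_one_shot(p_access_again=0.2)   # short reply, unlikely to be reread
report_flag = is_one_shot(p_access_again=0.8)  # status report, likely to be reread
bulky_flag = is_one_shot(density=0.05)         # low worth relative to its size
```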

[0038] It is to be appreciated that temporal changes can be accounted for in the present invention. By leveraging notions of temporal sensitivity of the likelihood that an item will be needed in the future, determined values and inferences can be dynamically updated over time. For instance, the probability component 112 can continually recalculate probabilities of items as new items become available and as information in the log 150 changes and is updated. The cost-benefit component 114 then utilizes current probabilities to determine contemporary value densities. Ongoing inferences can then be made about what to do with items. For instance, inferences can continually be made about whether items should be regarded as one-shot items. An item may, for example, be considered a one-shot item based upon a rate of decay or once its determined probability falls below some threshold probability within a predefined period of time (e.g., five minutes). After an initial read, an email message may have a probability of being accessed again that, given the evidence and/or conditions, is adjusted downward to such a degree that it falls below some threshold value and should be archived after the initial read. Similarly, an item may be regarded as a one-shot item if its determined value density drops below some threshold value density within a predefined period of time. The process can start when a message is received, wherein an initial probability of access is determined and then decays over time.
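A hedged sketch of the decay-based test follows. The specification does not fix the shape of the decay, so an exponential form with a hypothetical half-life parameter is assumed here purely for illustration; the function names and the example numbers are likewise assumptions.

```python
def decayed_probability(p_initial, minutes_elapsed, half_life_minutes):
    """Exponential decay of access probability (the decay form is an assumption)."""
    return p_initial * 0.5 ** (minutes_elapsed / half_life_minutes)


def one_shot_by_decay(p_initial, half_life_minutes, p_threshold, window_minutes):
    """True if the probability falls below the threshold within the time window."""
    p_at_window_end = decayed_probability(p_initial, window_minutes, half_life_minutes)
    return p_at_window_end < p_threshold


# Fast-decaying message: drops below 0.2 within a five minute window.
fast = one_shot_by_decay(p_initial=0.8, half_life_minutes=2.0,
                         p_threshold=0.2, window_minutes=5.0)
# Slow-decaying message: still well above the threshold after five minutes.
slow = one_shot_by_decay(p_initial=0.8, half_life_minutes=60.0,
                         p_threshold=0.2, window_minutes=5.0)
```

Because the assumed decay is monotone, checking the probability at the end of the window is enough to know whether it crossed the threshold within the window.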

[0039] An exemplary effect of the system over time is illustrated in FIG. 3, wherein, at a first time t1, items 2, 3, 7, 8, 11 and 12 are stored in an active item store 320 and items 1, 4, 5, 6, 9 and 10 are maintained in an archived item store 330. At time t2, however, items have been rearranged based upon inferential determinations over time such that the active item store 320 contains items 1, 2, 4, 9, 10 and 11 and the archived item store 330 contains items 3, 5, 6, 7, 8 and 12. At time t3, the items have again been shuffled in accordance with temporally adjusted determinations such that items 1, 4, 7, 8, 9 and 10 are included in the active item store 320, while items 2, 3, 5, 6, 11 and 12 are positioned in the archived item store 330. It is to be appreciated that, in addition to storing items, the present invention contemplates discarding and/or recalling items.

[0040] With reference now to FIG. 4, it is to be appreciated that the present invention has application to networked environments wherein R number of users may receive and/or have access to S number of items 402 (R and S being integers). The inference system 410 can determine how individual users would like to store the items. In this manner, the number of items stored as active can be streamlined on a user by user basis such that ease of review, access and searching of documents is facilitated. In the example illustrated, for user 1, items 1, 4 through T are stored in an active item store 420 and items 2, 5 through U are stored in an archived item store 430. In the same manner, for user 2, items 3, 4 through V are stored actively 422, while items 2, 7 through W are archived 432, and for user R, items 5, 6 through X are actively stored 424, while items 1, 6 through Y are archived 434 (T-Y being integers). It is to be appreciated that, while separate active and archived item stores are depicted, user based selections of items may be stored in a single active item store and a single archived item store (120, 130, FIG. 1). It is also to be appreciated that user specific selections can be updated over time. Although not shown, at a later point in time, for instance, the items may be rearranged on a user by user basis such that items 2, 7 through B; 4, 6 through C; and 3, 4 through D are stored actively, while items 4, 5 through E; 5, 8 through F; and 1, 9 through G are archived for users 1, 2, and R, respectively (B-G being integers).

[0041] With reference now to FIG. 5, in accordance with another aspect of the present invention, an optimization component 516 is included within an inference system 510. The optimization component 516 mitigates difficulties associated with storing a plurality of items 502 numbered 1 through H in a limited amount of active space (H being an integer). More particularly, where several items are important enough to be maintained in an active item store 520, the optimization component 516 is operable to decide the best or most useful assortment of items. In operation, the optimization component may employ utility directed knapsack methodologies in its computations. Determined value densities and the amount of active space available are but some of the factors that may be considered by the optimization component. In particular, ordering items by value density is but one approximation to the knapsack problem of fitting the best items of different sizes and values into a limited space (e.g., limited memory). This approximation has certain guarantees on optimality (e.g., a value no less than 0.5 of optimal), but improvements can be achieved with more computational resources, e.g., by employing a complete search (intractable) or a limited search (more tractable) to find a best fit. Furthermore, it should be appreciated that the general knapsack problem of putting in items of different values and sizes has not been applied to the challenges noted herein with respect to the subject invention. Accordingly, in the example illustrated in FIG. 5, both a cost-benefit component 514 and active item store 520 are coupled to the optimization component 516. The optimization component obtains value densities from the cost-benefit component 514 and the amount of active space available from the active item store 520.
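To make the contrast between the value density approximation and a complete search concrete, the following sketch compares the two on a tiny made-up instance. Treating p(access | E) as an item's value is an assumption of the example, as are the function names and the item figures.

```python
from itertools import combinations


def best_subset_exhaustive(items, capacity):
    """items: list of (name, p_access, size). Tries every subset; exponential time."""
    best_value, best_names = 0.0, ()
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            size = sum(s for _, _, s in subset)
            value = sum(p for _, p, _ in subset)
            if size <= capacity and value > best_value:
                best_value = value
                best_names = tuple(n for n, _, _ in subset)
    return best_value, best_names


def greedy_by_density(items, capacity):
    """Take items in descending p/size order, stopping at the first that does not fit."""
    chosen, used, value = [], 0.0, 0.0
    for name, p, size in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        if used + size > capacity:
            break
        chosen.append(name)
        used += size
        value += p
    return value, tuple(chosen)


# Hypothetical instance: one small item and one large, more valuable item.
items = [("a", 0.5, 1.0), ("b", 0.9, 10.0)]
greedy_value, greedy_names = greedy_by_density(items, capacity=10.0)
optimal_value, optimal_names = best_subset_exhaustive(items, capacity=10.0)
```

On this instance the density ordering keeps only item "a" (total value 0.5), whereas the complete search keeps item "b" (total value 0.9). The complete search examines every subset, which is why it becomes intractable as the number of items grows.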

[0042] A probability component 512 is also shown in the inference system 510. The probability component 512 is operative to apply information about the items 502, users and extrinsic data 504 to probabilistic techniques to determine the likelihood that an item will be accessed. The probabilistic techniques employed may include, but are not limited to, neural networks, naive Bayesian processing, sophisticated Bayesian processing, similarity analysis employing dot product and/or cosine function processing and decision tree processing. The probability component 512 may obtain some of this information from a property log 550 that stores such information. The cost-benefit component 514 utilizes determined probabilities from the probability component 512 and item size to calculate respective value densities of items.

[0043] The optimization component 516 determines if all of the items worth storing in the active space will actually fit within that space. Items typically warrant storage in active space if their value densities are above some predefined threshold. To determine if these items will fit within the active item store, the optimization component 516 is operative to manipulate metrics in any suitable manner, such as by adding their respective sizes. The optimization component 516 can then compare this sum to the size of the active item store. If not all of the items will fit within the active item store, the optimization component can assess which items to relegate to an archive item store or discard. The optimization component 516 can make this determination in any of a variety of ways. For example, the optimization component 516 may be adapted to find the arrangement of items that will maximize the total value density (e.g., the assortment of items whose value densities, when summed, yield a maximum value). To effect this, the optimization component can begin storing the items that have the greatest value densities in the active space 520. This can be done in a sequential manner until the next item in queue will not fit within the active space. The remaining qualifying items can then be moved to the archive item store 530 or discarded.

[0044] To further mitigate the inefficient use of the active space, the optimization component can also examine the remaining items (e.g., those items that have value densities above some predefined threshold, but are not stored in the active space). If any of these items will fit within the residual amount of active space, they can be stored accordingly. In this manner, the number of actively stored items is streamlined to facilitate ease of review, access and searching of items.
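The two-pass procedure described in the preceding paragraphs (sequential fill by descending value density, then a residual-space pass over the remaining qualifying items) could be sketched as below. The function name fill_active_store, the dictionary layout and the example sizes are assumptions for illustration, not a definitive implementation of the optimization component.

```python
def fill_active_store(items, capacity, density_threshold):
    """items: dict name -> (p_access, size). Returns (active, archived) name lists."""
    density = {name: p / size for name, (p, size) in items.items()}
    # Only items whose value density exceeds the threshold qualify for active space.
    qualifying = sorted((n for n in items if density[n] > density_threshold),
                        key=lambda n: density[n], reverse=True)
    active, remaining = [], capacity
    # Pass 1: sequential fill until the next item in the queue does not fit.
    idx = 0
    while idx < len(qualifying) and items[qualifying[idx]][1] <= remaining:
        name = qualifying[idx]
        active.append(name)
        remaining -= items[name][1]
        idx += 1
    # Pass 2: place any leftover qualifying item that fits the residual space.
    for name in qualifying[idx:]:
        if items[name][1] <= remaining:
            active.append(name)
            remaining -= items[name][1]
    archived = [n for n in items if n not in active]
    return active, archived


# Made-up items: name -> (probability of access, size).
mailbox = {
    "m1": (0.9, 3.0),
    "m2": (0.8, 4.0),
    "m3": (0.6, 5.0),
    "m4": (0.2, 2.0),
    "m5": (0.01, 1.0),
}
active, archived = fill_active_store(mailbox, capacity=10.0, density_threshold=0.05)
```

In this example m3 qualifies but does not fit after m1 and m2 are placed, so the second pass stores the smaller m4 in the residual space instead, while m5 never qualifies.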

[0045] In the example illustrated in FIG. 5, items numbered 2, 5, 6, 8, 9, 10 through J are shown in the active space 520 (J being an integer). Presumably, this is the collection that best mitigates the inefficient use of active space. For example, as discussed above, this may be the arrangement of items that, when summed, yields a maximum value density. This grouping may also include one or more remaining items that just fit within the residual active space. For instance, the sum of items 2, 5, 6, 8, 9, 10 may yield a maximum value density, while remaining item J may include a smaller item that just fits into the active item store. Items numbered 1, 3, 4, 7, 11, 12 through K are shown as being stored in the archived item store 530 (K being an integer). This set contains those items that are not useful enough (e.g., do not have sufficient value densities) to be stored in active space. It may also include one or more items that warrant being stored in active space, but will not fit within the remaining active space. For instance, items 11 and 12 may have been the next items in queue, but had to be relegated to the archive.

[0046] It is to be appreciated that temporal changes can be accounted for such that the active space can be utilized efficiently over time. To do so, the probability component 512 continually updates or recalculates respective probabilities of items as new items are added and/or as information is updated in the property log. The cost-benefit component 514 can then utilize these probabilities to determine contemporary value densities. With updated value densities, the optimization component 516 can constantly re-arrange the items that are stored in the active space 520 and the archive 530.

[0047] Turning to FIG. 6, a learning component 660 may be operable to act on an inference system 610 and affect decisions made thereby. The learning system 660 can employ manual and/or automated means to analyze information relating to items 602, users and extrinsic data 612, some of which may be obtained from a property log 650. Conditional probabilities including, but not limited to, Bayesian statistical analysis can be utilized by the learning system. Results from computations performed by the learning system can be utilized to create and/or adapt decisions made by the inference system. The learning system 660 can operate continually such that one or more inferences about how to store items evolve over time. For example, the manner within which the inference system 610 recognizes one-shot items can be updated as the learning system observes user activity and notes the items that are accessed only once and the circumstances surrounding such activity, including information stored in the log 650.

[0048] Turning now to FIG. 7, an interactive user interface (UI) 700 in accordance with an aspect of the present invention is illustrated. The UI 700 allows a user to personalize how items are stored. In the example shown, the UI is adapted to display L number of conditions 702 that affect how items are to be handled (L being an integer). Some of the entries are policies 710 that affect probabilistic determinations. Other entries are archiving rules 720. Still others 730, 740 regard item discard and memory utilization protocols, respectively. The UI is interactive in that it provides for user enablement/disablement and/or customization. More particularly, in the example shown, activation boxes are included. A user can click a mouse on these boxes to enable/disable the corresponding entries. For instance, activation box 752 next to the second policy 712 is checked. In this fashion, the second policy has been enabled and the probability will, therefore, be reduced by a certain amount after a user definable period of time passes during which the message is not accessed. Similarly, box 754 next to the second archiving rule 722 is checked. Accordingly, the second archiving rule has been enabled and therefore a message will only be archived once it exceeds a certain age. The UI also includes entry boxes that enable a user to configure entries. For instance, a user can type a value into entry box 762 to specify the degree to which the second policy should reduce the probability based upon the age of the message. The age itself is also user definable via box 764. Likewise, a user can type a value into box 766 to customize the second rule. It is to be appreciated that not all of the policies need to have the capability to be enabled/disabled and/or customized by a user. For instance, the third rule 724 shown in FIG. 7 is a firm rule that can neither be enabled/disabled nor customized. Although activation and entry boxes are illustrated, the present invention contemplates any suitable number and/or type of user interface elements (e.g., slide bar elements, drop down menus, dials, buttons, speech input/output elements). Additionally, it is to be appreciated that any suitable number of entries can be incorporated into a UI, such as that shown in FIG. 7. For instance, there are different manners in which to specify goals of an archiving system, including user preferences, thresholds, cost-benefit analyses, and specifications about free space (e.g., “leave at least x megs of free space in the active store”, “archive any item with a p(access again | E) of less than 0.05”, “archive any item with a value density of less than 0.1”).
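
The user-configurable rules described above might, for example, be represented as a small settings object in which an unset value corresponds to an unchecked activation box. The following is a sketch under that assumption; the names ArchivingRules and should_archive are hypothetical, and the default thresholds of 0.05 and 0.1 are simply the example values quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivingRules:
    # None means the corresponding rule is disabled (box unchecked).
    min_p_access: Optional[float] = 0.05      # archive below this probability
    min_value_density: Optional[float] = 0.1  # archive below this density
    min_age_days: Optional[float] = None      # never archive younger items


def should_archive(p_access, size_mb, age_days, rules):
    """Apply the enabled rules to one item and return True to archive it."""
    # Age rule: an item is only eligible for archiving once it is old enough.
    if rules.min_age_days is not None and age_days < rules.min_age_days:
        return False
    # Probability threshold rule.
    if rules.min_p_access is not None and p_access < rules.min_p_access:
        return True
    # Value density threshold rule (probability of access per megabyte).
    if rules.min_value_density is not None:
        if p_access / size_mb < rules.min_value_density:
            return True
    return False


defaults = ArchivingRules()
with_age_rule = ArchivingRules(min_age_days=30)
```

Here enabling the age rule overrides the two threshold rules for young items, which is one possible reading of "a message will only be archived once it exceeds a certain age".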

[0049] Turning now to FIG. 8, since the present invention contemplates temporal sensitivity, a plot 800 of the probabilistic determination for an item is illustrated over time. In the graph, the probability that the item will be accessed given some evidence is plotted on the Y-axis and time is plotted on the X-axis. The probability curve is illustrated as decaying over time. The curve is drawn in this manner because empirical testing has provided support for the proposition that items become less useful over time and thus it is less likely that an item will be accessed as time goes on. Accordingly, at time T0, which is the first instance of the item, the probability that a user will access the item is at a maximum value, namely Pmax. At this point, the probability is, in all likelihood, high enough to warrant maintaining the item in an active state. As time goes on, however, the likelihood that the item will be accessed decreases. Thus, at some later point in time, namely Tout, the probability that a user will access the item decreases to a value, namely Pout, where it is no longer warranted to maintain the item in an active state. At Pout, the probability is at a level that, considering the size of the item, the value density is likely below some threshold level. At this point, an inference system may determine that the item no longer qualifies for storage in active space. In this fashion, the item can be aged out of an active item store. The curve shown in FIG. 8 schematically represents modeling p(access >1) where T0 starts at the time a message is first read.
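
To make the relationship between Pmax, Pout and Tout concrete, the following sketch assumes an exponential decay of the probability. The exponential form, the decay rate and the function names are assumptions introduced for illustration; the specification states only that the curve decays over time.

```python
import math


def p_access(t, p_max, decay_rate):
    """Assumed model: probability of access decays exponentially from p_max."""
    return p_max * math.exp(-decay_rate * t)


def time_to_age_out(p_max, size, density_threshold, decay_rate):
    """Time Tout at which the value density p(t) / size reaches the threshold."""
    # Pout is the probability at which the value density equals the threshold.
    p_out = density_threshold * size
    if p_max <= p_out:
        # The item never qualified for active space in the first place.
        return 0.0
    # Solve p_max * exp(-decay_rate * t) = p_out for t.
    return math.log(p_max / p_out) / decay_rate


# Hypothetical item: Pmax = 0.8, size 2 units, density threshold 0.1 per unit.
t_out = time_to_age_out(p_max=0.8, size=2.0, density_threshold=0.1, decay_rate=0.1)
```

Under this model a larger item reaches Pout sooner than a smaller item with the same Pmax, since its value density starts lower.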

[0050] With reference now to FIG. 9, a slightly different probability versus time curve 900 is shown. While the curve generally shows a decaying probability over time, some notable changes in the probability are also reflected. The curve initially has a probability Pinit at time Tinit. The probability then drops rather suddenly from Pinit to P1 in a relatively short period of time, namely from Tinit to T1. This may correspond to the situation where an email message is accessed and reviewed by a user shortly after it is received. The degree to which the probability drops is a function of the policies in place and information regarding the item, user and extrinsic data. For example, if the message is a short reply message, such as “see you later”, which is sent by someone with whom the user frequently corresponds and which is sent in response to user initiated dialog, the policies in place may cause the probability of that message to drop significantly after the message is accessed a first time. Alternatively, however, the situation is also illustrated in the curve wherein the probability increases slightly over time, namely from probability P1 at time T1 to probability P2 at time T2. This may, for example, occur where the email message is re-accessed several times by a user and the policies are configured to reflect the premise that where a message is re-accessed several times, the probability that the user will again re-access the message increases over time. This may be the case, for instance, where the email message contains a code and/or password. A user may repeatedly access the email message to retrieve the code and/or password until he/she has it memorized. After the user has the code and/or password memorized, the message is no longer re-accessed by the user, and therefore, the probability will be reduced over time. Thus, the probability subsequently falls to some value, namely Pthresh at time Tthresh, wherein it no longer warrants being maintained in active memory. At this point, the item can be aged out and archived or discarded. The curve shown in FIG. 9 could also apply to an item already in the archive whose probability of being accessed again is updated over time. For example, certain items, even those stored offline, might increase in p(access again | E) where the interest of the user with respect to a particular topic(s) is learned, via aged email. Updates can continually be reviewed to facilitate making decisions about whether to bring back items into the active store from the archive.
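
The shape of the curve of FIG. 9 can be reproduced with a simple event-driven update, sketched below. The event names, the multiplicative drop and decay, the additive boost and all numeric values are assumptions made for this illustration; in practice they would be governed by the policies in place.

```python
def update_probability(p, event, *, first_read_drop=0.5,
                       reaccess_boost=0.1, idle_decay=0.05):
    """Return the updated probability of access after one observed event."""
    if event == "first_read":
        # Sudden drop from Pinit to P1 once the message has been reviewed.
        p = p * (1.0 - first_read_drop)
    elif event == "reaccess":
        # Repeated access suggests the item will be needed again (P1 to P2).
        p = p + reaccess_boost
    elif event == "idle_period":
        # Gradual decay while the item is left untouched.
        p = p * (1.0 - idle_decay)
    else:
        raise ValueError(f"unknown event: {event}")
    # Keep the result a valid probability.
    return min(1.0, max(0.0, p))


p_init = 0.9
p1 = update_probability(p_init, "first_read")
p2 = update_probability(p1, "reaccess")
p_late = p2
for _ in range(60):
    p_late = update_probability(p_late, "idle_period")
```

After enough idle periods the probability falls below any fixed Pthresh, at which point the item can be aged out; the same update could equally be applied to an archived item to decide whether to bring it back.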

[0051] In view of the foregoing graphical, structural and functional features described above, a methodology in accordance with various aspects of the present invention will be better appreciated with reference to FIG. 10. While, for purposes of simplicity of explanation, the methodology of FIG. 10 is shown and described as occurring serially, it is to be understood and appreciated that the present invention is not limited by the illustrated order, as some aspects could, in accordance with the present invention, occur in different orders and/or concurrently with other aspects from that shown and described herein. Moreover, not all illustrated features may be required to implement a methodology in accordance with an aspect of the present invention. It is further to be appreciated that the following methodology may be implemented as computer-executable instructions, such as software stored in a computer-readable medium. Alternatively, the methodology may be implemented as hardware or a combination of hardware and software.

[0052] FIG. 10 illustrates a methodology 1000 for determining what to do with an item, and more particularly whether to archive or actively retain the item. The methodology begins at 1002 wherein general initializations occur. Such initializations can include, but are not limited to, allocating memory, establishing pointers, establishing data communications, acquiring resources, setting variables and displaying process activity. At 1004 a probability that the item will be accessed is determined based upon some evidence. The present invention contemplates that the evidence can include information about the item, user, and/or extrinsic data, where some of this information may be obtained from a property log. The probability that the item will be accessed is an indication of the relevance and utility of the item. After 1004, the methodology proceeds to 1006 wherein a value density is determined. The value density is a measurement of the worth of the item and may be determined by comparing the probability that the item will be accessed to the size of the item (which affects the amount of space the item will occupy). More particularly, the value density may be determined by finding the ratio of benefit of the item (measured in terms of the probability that the item will be accessed) to the cost of the item (measured in terms of the size of the item). After 1006, the methodology then proceeds to 1008 wherein an inference is drawn regarding how to store the item (e.g., actively or not). This may include an evaluation of whether or not the item is a one-shot item and may be based upon conditions, some of which may be personalized. If the item is to be stored actively, the methodology proceeds to 1010 where the item is put through an optimization protocol to ascertain if the item should be stored in active space given the relative worth of the item as compared to other items and the amount of space available. A customized allocation of fast memory may be considered in 1010. If it is determined that the item is to be stored in active space, the methodology proceeds to 1012 wherein the item is stored in an active item store. If the determination in 1008 or 1010 is negative, the process proceeds to 1014 wherein the item is archived or discarded. Since the present invention contemplates temporal sensitivity, after 1012 and 1014 the method returns to 1002 so that the process can continually reassess whether the item is to be stored within active or archival space.
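
A single pass through acts 1004 to 1014 may be sketched as follows. The function name, the threshold values and the reduction of the optimization act 1010 to a simple check of whether the item fits the free space are simplifying assumptions of this sketch; the probability is taken as an input rather than computed from evidence.

```python
def assess_item(p_access, size, free_space, *,
                one_shot_threshold=0.05, density_threshold=0.1):
    """Return "active" or "archive" for one pass of the methodology."""
    # 1004: the probability of access is supplied by the caller here.
    # 1006: value density is the ratio of benefit (probability) to cost (size).
    value_density = p_access / size

    # 1008: infer whether the item should be stored actively at all.
    if p_access < one_shot_threshold:
        return "archive"  # treated as a one-shot item
    if value_density < density_threshold:
        return "archive"

    # 1010: simplified optimization, checking only that the item fits.
    if size > free_space:
        return "archive"  # 1014

    return "active"  # 1012


decisions = [
    assess_item(0.60, 2.0, free_space=10.0),
    assess_item(0.02, 1.0, free_space=10.0),
    assess_item(0.60, 8.0, free_space=10.0),
    assess_item(0.90, 4.0, free_space=3.0),
]
```

Because the method returns to 1002 after each pass, such a function would be called repeatedly with refreshed probabilities, so an item judged active on one pass may be archived on a later one.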

[0053] In order to provide additional context for various aspects of the present invention, FIG. 11 and the following discussion are intended to provide a brief, general description of one possible suitable computing environment in which the various aspects of the present invention may be implemented. It is to be appreciated that the computing environment is but one possible computing environment and is not intended to limit the computing environments with which the present invention can be employed. While the invention has been described above in the general context of computer-executable instructions that may run on one or more computers, it is to be recognized that the invention also may be implemented in combination with other program modules and/or as a combination of hardware and software. Generally, program modules include routines, programs, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, one will appreciate that the inventive methods may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which may be operatively coupled to one or more associated devices. The illustrated aspects of the invention may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

[0054] FIG. 11 illustrates one possible hardware configuration to support the systems and methods described herein. It is to be appreciated that although a standalone architecture is illustrated, any suitable computing environment can be employed in accordance with the present invention. For example, computing architectures including, but not limited to, stand alone, multiprocessor, distributed, client/server, minicomputer, mainframe, supercomputer, digital and analog can be employed in accordance with the present invention.

[0055] With reference to FIG. 11, an exemplary system environment 1100 for implementing the various aspects of the invention includes a conventional computer 1102, including a processing unit 1104, a system memory 1106, and a system bus 1108 that couples various system components including the system memory to the processing unit 1104. The processing unit 1104 may be any commercially available or proprietary processor. In addition, the processing unit may be implemented as a multi-processor formed of more than one processor, such as may be connected in parallel.

[0056] The system bus 1108 may be any of several types of bus structure including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of conventional bus architectures such as PCI, VESA, Microchannel, ISA, and EISA, to name a few. The system memory 1106 includes read only memory (ROM) 1110 and random access memory (RAM) 1112. A basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within the computer 1102, such as during start-up, is stored in ROM 1110.

[0057] The computer 1102 also may include, for example, a hard disk drive 1114, a magnetic disk drive 1116, e.g., to read from or write to a removable disk 1118, and an optical disk drive 1120, e.g., for reading from or writing to a CD-ROM disk 1122 or other optical media. The hard disk drive 1114, magnetic disk drive 1116, and optical disk drive 1120 are connected to the system bus 1108 by a hard disk drive interface 1124, a magnetic disk drive interface 1126, and an optical drive interface 1128, respectively. The drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, etc. for the computer 1102. Although the description of computer-readable media above refers to a hard disk, a removable magnetic disk and a CD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, and the like, may also be used in the exemplary operating environment 1100, and further that any such media may contain computer-executable instructions for performing the methods of the present invention and/or may contain components that are to be installed in accordance with an aspect of the present invention.

[0058] A number of program modules may be stored in the drives and RAM 1112, including an operating system 1130, one or more application programs 1132, other program modules 1134, and program data 1136. The operating system 1130 may be any suitable operating system or combination of operating systems.

[0059] A user may enter commands and information into the computer 1102 through one or more user input devices, such as a keyboard 1138 and a pointing device (e.g., a mouse 1140). Other input devices (not shown) may include a microphone, a joystick, a game pad, a satellite dish, a scanner, or the like. These and other input devices are often connected to the processing unit 1104 through a serial port interface 1142 that is coupled to the system bus 1108, but may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 1144 or other type of display device is also connected to the system bus 1108 via an interface, such as a video adapter 1146. In addition to the monitor 1144, the computer 1102 may include other peripheral output devices (not shown), such as speakers, printers, etc.

[0060] The computer 1102 may operate in a networked environment using logical connections to one or more remote computers 1160. The remote computer 1160 may be a workstation, a server computer, a router, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 1102, although, for purposes of brevity, only a memory storage device 1162 is illustrated in FIG. 11. The logical connections depicted in FIG. 11 may include a local area network (LAN) 1164 and a wide area network (WAN) 1166. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

[0061] When used in a LAN networking environment, the computer 1102 is connected to the local network 1164 through a network interface or adapter 1168. When used in a WAN networking environment, the computer 1102 typically includes a modem 1170, or is connected to a communications server on the LAN, or has other means for establishing communications over the WAN 1166, such as the Internet. The modem 1170, which may be internal or external, is connected to the system bus 1108 via the serial port interface 1142. In a networked environment, program modules depicted relative to the computer 1102, or portions thereof, may be stored in the remote memory storage device 1162. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers 1102 and 1160 may be used.

[0062] In accordance with the practices of persons skilled in the art of computer programming, the present invention has been described with reference to acts and symbolic representations of operations that are performed by a computer, such as the computer 1102 or remote computer 1160, unless otherwise indicated. Such acts and operations are sometimes referred to as being computer-executed. It will be appreciated that the acts and symbolically represented operations include the manipulation by the processing unit 1104 of electrical signals representing data bits which causes a resulting transformation or reduction of the electrical signal representation, and the maintenance of data bits at memory locations in the memory system (including the system memory 1106, hard drive 1114, floppy disks 1118, CD-ROM 1122, and shared storage system 1110) to thereby reconfigure or otherwise alter the computer system's operation, as well as other processing of signals. The memory locations where such data bits are maintained are physical locations that have particular electrical, magnetic, or optical properties corresponding to the data bits.

[0063] As used in this application, the term “component” is intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program and a computer. By way of illustration, both an application running on a server and the server can be a component. Additionally, as used in this application, “system” is a structure comprising one or more modules. A “module” is a structure comprising computer hardware and/or software. For example, a module can be, but is not limited to, a computer readable memory encoded with software instructions or a computer configuration to carry out specified tasks. By way of illustration, both an application program stored in computer readable memory and a server on which the application runs can be a module. Due to the nature of modules, multiple modules can be intermingled and are often not separated from one another. Systems can likewise be intermingled and inseparable. Likewise, it is to be appreciated that a module can be a software object.

[0064] It is to be appreciated that various aspects of the present invention may employ technologies associated with facilitating unconstrained optimization (e.g., back-propagation, Bayesian, Fuzzy Set, Non Linear regression, or other neural network paradigms including mixture of experts, cerebellar model arithmetic computer (CMACS), Radial Basis Functions, directed search networks, and functional link nets).

[0065] What has been described above includes exemplary implementations of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the present invention, but one of ordinary skill in the art will recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

What is claimed is:
 1. A system that stores an item comprising: a component that utilizes a formal probabilistic analysis to infer whether to store the item in an active item store or an archive item store.
 2. The system of claim 1, the component determines a value density for the item based upon a probability of user access of the item.
 3. The system of claim 2, the value density is the probability of access of the item given evidence divided by the size of the item.
 4. The system of claim 1, the component infers to store the item in the archive item store if a probability of user access is less than a threshold value.
 5. The system of claim 2, the component infers to store the item in the archive item store if the value density is less than a threshold value.
 6. The system of claim 2, the probability of user access of the item is time dependent.
 7. The system of claim 6, the time dependent probability of access decays over time.
 8. The system of claim 7, the decay is based on at least one of policies, and information regarding the item, a user, and extrinsic data.
 9. The system of claim 6, the time dependent probability of access increases over time based on user re-access of the item.
 10. The system of claim 1, further comprising a user interface that facilitates user control.
 11. The system of claim 10, the user interface facilitates control over policies that affect probabilistic determinations.
 12. The system of claim 11, the user provides an amount to reduce the probability of access by after the item is accessed via the user interface.
 13. The system of claim 11, the user interface facilitates reducing the probability over time by a selected amount after an amount of time selected by the user.
 14. The system of claim 11, the user interface enables a user to control a threshold probability of access more than once that defines a one-shot item.
 15. The system of claim 11, the user controls the amount by which to increase the probability when the item has an attachment.
 16. The system of claim 10, the user interface facilitates control over an archiving rule.
 17. The system of claim 11, the user controls a threshold value density and an item with a value density less than the threshold value density is stored in the archive item store.
 18. The system of claim 11, the user interface provides for the user to select a minimum age of the item that is stored in the archive item store.
 19. The system of claim 11, the user interface facilitates user selection of whether to archive one-shot items after they are read.
 20. The system of claim 10, the user interface provides user control regarding item discard protocols.
 21. The system of claim 20, the item discard protocol is a protocol that discards an item from a user selected sender.
 22. The system of claim 10, the user interface enables user control over memory utilization protocols.
 23. The system of claim 22, the memory utilization protocol is a user controlled number of items stored in the active item store.
 24. The system of claim 22, the memory utilization protocol is a user selected percentage ceiling of active space as the active item store.
 25. The system of claim 1, further comprising a learning system that analyzes at least one of the items, user data, and extrinsic data to adapt inferences performed via the formal probabilistic analysis.
 26. The system of claim 25, the learning system utilizes classifiers that consider distinctions between at least one of an address, an attachment, header information, and a body structure of the item.
 27. The system of claim 1, further comprising an optimization component that maximizes the total value density, the optimization component selects items with the greatest value densities to store in the active item store and examines remaining items to determine which fit into a residual amount of space within the active item store.
 28. A method for determining how to store an item comprising: utilizing formal probabilistic analysis to determine probability of user access of an item; determining a value density of an item based on the probability of user access; and inferring whether to store the item in an active item store or an archive item store based on at least one of the probability of user access and the value density of the item.
 29. The method of claim 28, further comprising storing the item in the archive item store based on at least one of the probability and the value density being less than a threshold value.
 30. The method of claim 28, further comprising decreasing at least one of the probability of user access and the value density over time based on at least one of policies, and information regarding the item, a user, and extrinsic data.
 31. The method of claim 28, further comprising increasing at least one of the probability of user access and the value density over time based on a user re-accessing the item.
 32. The method of claim 28, further comprising controlling policies that affect the formal probabilistic analysis via a user interface.
 33. The method of claim 32, further comprising selecting at least one of a probability reduction value for after an item is selected, a time based probability reducing amount, a threshold value to identify a one-shot item, and a probability increasing value corresponding to an attachment for the item.
 34. The method of claim 28, further comprising controlling an archiving rule by selecting at least one of a threshold density value below which items are stored in the archive item store, a minimum duration of time prior to storing an item in the archive item store, and whether to archive one-shot items subsequent to reading.
 35. The method of claim 28, further comprising controlling an item discard protocol via a user specifying senders of items to discard.
 36. The method of claim 28, further comprising controlling a memory utilization protocol via a user selecting a number of items to store in the active item store and a percentage ceiling of active space as the active item store.
 37. The method of claim 28, further comprising adapting the performed inferences via machine learning.
 38. The method of claim 37, learning utilizes classifiers that distinguish between items, the classifiers are built by watching a user re-access email over time and automatically training a filter that assigns an incoming message at least one of a time until future access and a probability of access.
 39. The method of claim 28, further comprising archiving an item to minimize latencies via maximizing the overall value densities of the items stored in the active item store.
 40. A system that archives items based on utility comprising: means for performing formal probabilistic analysis upon an item; means for determining value density of the item based on the results from the formal probabilistic analysis; and means for storing the item in an active item store or an archive item store based upon at least one of the formal probabilistic analysis and the value density. 